Learning Bimodal Structure in Audio–Visual Data
Authors
Abstract
A novel model is presented to learn bimodally informative structures from audio-visual signals. The signal is represented as a sparse sum of audio-visual kernels. Each kernel is a bimodal function consisting of synchronous snippets of an audio waveform and a spatio-temporal visual basis function. To represent an audio-visual signal, the kernels can be positioned independently and arbitrarily in...
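The abstract describes representing a signal as a sparse sum of audio-visual kernels that can be positioned independently and arbitrarily in time. As a loose, hypothetical sketch of that shift-invariant sparse representation (audio component only; the function names, toy kernels, and sparse code below are illustrative, not from the paper):

```python
import numpy as np

def reconstruct(length, kernels, atoms):
    """Sum coefficient-scaled kernels placed at arbitrary offsets.

    atoms: list of (kernel_index, offset, coefficient) triples,
    i.e. a sparse code over a shift-invariant kernel dictionary.
    """
    signal = np.zeros(length)
    for k, t, c in atoms:
        g = kernels[k]
        signal[t:t + len(g)] += c * g  # place kernel k at time t
    return signal

# Two toy audio kernels (short waveform snippets).
kernels = [np.hanning(8), np.sin(np.linspace(0, np.pi, 6))]

# A sparse code: kernel 0 at offset 2, kernel 1 at offset 20.
atoms = [(0, 2, 1.5), (1, 20, -0.8)]
x = reconstruct(32, kernels, atoms)
```

In the full bimodal model each kernel would pair such an audio snippet with a synchronous spatio-temporal visual basis function; the sparse code then indexes joint audio-visual events.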
Similar resources

Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection
Voice activity detection (VAD) is an important preprocessing step in speech-based systems, especially for emerging hands-free intelligent assistants. Conventional VAD systems relying on audio-only features are normally impaired by environmental noise. An alternative approach to this problem is audiovisual VAD (AV-VAD). Modeling timing dependencies between acoustic and visual ...
Anatomical Structure Sketcher for Cephalograms by Bimodal Deep Learning
Lateral cephalogram X-ray (LCX) images are essential for providing patient-specific morphological information about anatomical structures. The automatic annotation of anatomical structures in cephalograms has been studied in biomedical engineering for nearly twenty years. Most systems handle only a portion of the salient craniofacial landmark set [1, 2, 3]. Although model-based methods can produce a...
Maximum Covariance Unfolding: Manifold Learning for Bimodal Data
We propose maximum covariance unfolding (MCU), a manifold learning algorithm for simultaneous dimensionality reduction of data from different input modalities. Given high dimensional inputs from two different but naturally aligned sources, MCU computes a common low dimensional embedding that maximizes the cross-modal (inter-source) correlations while preserving the local (intra-source) distance...
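MCU itself is posed as a semidefinite program; the following hypothetical sketch illustrates only the "maximize cross-modal covariance" idea, using the SVD of a cross-covariance matrix between two aligned modalities (a PLS-style simplification, not the MCU algorithm, and all data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two naturally aligned modalities driven by one shared latent factor.
shared = rng.normal(size=(100, 1))
X = shared @ rng.normal(size=(1, 5)) + 0.1 * rng.normal(size=(100, 5))
Y = shared @ rng.normal(size=(1, 4)) + 0.1 * rng.normal(size=(100, 4))

Xc, Yc = X - X.mean(0), Y - Y.mean(0)   # center each modality
C = Xc.T @ Yc / (len(X) - 1)            # cross-covariance matrix
U, s, Vt = np.linalg.svd(C)             # top pair maximizes covariance

# Project each modality onto its top direction: a 1-D common embedding.
x_embed, y_embed = Xc @ U[:, 0], Yc @ Vt[0]
corr = np.corrcoef(x_embed, y_embed)[0, 1]
```

Unlike this linear sketch, MCU additionally preserves local intra-source distances, which is what makes it a nonlinear manifold-learning method.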
Learning Multi-modal Dictionaries: Application to Audiovisual Data
This paper presents a methodology for extracting meaningful synchronous structures from multi-modal signals. Simultaneous processing of multi-modal data can reveal information that is unavailable when handling the sources separately. However, in natural high-dimensional data, the statistical dependencies between modalities are, most of the time, not obvious. Learning fundamental multi-modal pat...
Journal

Journal title: IEEE Transactions on Neural Networks
Year: 2009
ISSN: 1045-9227,1941-0093
DOI: 10.1109/tnn.2009.2032182